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Abstract 

The availability of efficient Krylov subspace solvers play a vital role for 
the solution of a variety of numerical problems in computational science. 
Here we consider lattice field theory. 

We present a new general numerical method to compute many Green's 
functions for complex non-singular matrices within one iteration process. 
Our procedure applies to matrices of structure A = D — with m pro- 
portional to the unit matrix, and can be integrated within any Krylov 
subspace solver. We can compute the derivatives x*^"-* of the solution vec- 
tor X with respect to the parameter m and construct the Taylor expansion 
of X around m. 

We demonstrate the advantages of our method using a minimal residual 
solver. Here the procedure requires 1 intermediate vector for each Green's 
function to compute. As real life example, we determine a mass trajectory 
of the Wilson fermion matrix for lattice QCD. Here we find that we can 
obtain Green's functions at all masses > m at the price of one inversion 
at mass m. 



1 Introduction 



Lattice Quantum Chromodynamics demands for enormous compute power and 
efficient numerical algorithms in order to solve huge sparse systems of linear 
equations. These linear systems originate from a discretized differential oper- 
ator, the fermionic matrix, with varying gluonic background fields acting as 
coefficient functions. The latter can be generated by Monte Carlo methods. 
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From the solutions, i.e. the Green's functions or propagators, one can deter- 
mine the properties of hadrons, as e.g. spectrum, weak decay constants or weak 
matrix elements ^, ^. 

Recently, in Ref. Q, it has been demonstrated that it can be very advantageous 
to exploit the structure of the Wilson fermion matrix M = m — D |5[, which 
is widely used in lattice calculations. In particular, the computation of so- 
called mass trajectories, i.e., multiple inversions of m — D for a set of 'bare 
masses'^] mo, mi, . . . , mjv with rrii < rrii+i, could be carried out within one 
iteration process, using the quasi minimum residual algorithm (QMR) [|, 0, ||. 
This useful property is a direct consequence of the non-symmetric Lanczos 
process where the Lanczos part had to be gone through only once as it 
depends solely on D. Therefore, the method presented in Ref. Q is restricted 
to QMR. Unfortunately the apphcation of the QMR-MULT algorithm |] is 
rather memory intensive: QMR itself needs 6 intermediate vectors of the size of 
one propagator component and requires to store 3 additional vectors for each of 
the members of the mass-trajectory. Computations of propagators as of today, 
however, tend to be memory bounded even on the largest distributed memory 
machines available. 

In this paper, we propose a new numerical approach to compute the entire mass 
trajectory simultaneously. This method is not restricted to a special solver such 
as QMR and requires only a modest additional amount of workspace. 

For definiteness and simplicity we present our idea as applied within the minimal 
residual algorithm (MR), one of the favorite standard solvers in lattice QCD 



that belongs to the class of the general conjugate residual methods |10|, MR 
is a GCR algorithm with recursion depth 0. 

Our key observation is that the derivatives x^"^ = (Px/dmJ^ of the solution x 
can be related recursively to the coefficients that occur within the iteration. 
Thus the Taylor expansion of the propagator x{m + Am) around m can be 
computed almost for free during the iteration process. 

We find numerically that the procedure renders x(mj > mo) automatically with 
higher accuracy than x(mo). The accuracy of the latter is determined as usual 
by the stopping residual's norm. This enables us to calculate the entire mass 
trajectory at once, provided that we compute at mo. On the other side, the 
procedure can provide good estimates for propagators with x(mj < mo) to be 
used as educated guesses for further inversions. 

We note that our method can be applied to the case of even-odd-preconditioning 



1 12]. For general sources, however, two separate inversions are necesssary. The 
additional expense of a factor of two easily pays off against large numbers of 
Green's functions to be computed. 



^The 'physical' bare masses differ from m by a constant shift, which is not relevant in this 
context 
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As a by-product we are in principle in the position to compute all derivatives 



up to order n = N, the number where the iteration stops. 



The paper is organized as follows: In section 2.1, we present the MR algorithm 



followed by the computation of the derivatives in section |2.2| . The construction 



of the Taylor series is explained in section 2.3. Furthermore we will present 
the results of numerical tests that deal with the inversion of the Wilson fermion 
matrix on a real full QCD configuration at /3 = 5.6 and k = 0.1575 on a 16^ x 32 
lattice. 



2 M^R: Multiple Masses Minimal Residual 



2.1 Standard Minimal Residual Algorithm 



Consider the linear system: 

Ax = [D — m)x = tq (1) 

with A being a complex nonsingular matrix. The search direction in the Min- 
imal Residual Algorithm is taken to be the direction of the residual. At every 
iteration step the following operation is performed: 

Xi+i = Xi + aiVi (2) 
rj+i = Ti- aiAvi. 

The coefficient Ui is determined by a minimization of the 2-norm of the new 
residual rj+i: 

_ {Ari)hi 



2.2 Computation of the Derivatives 



It is possible to extract in addition to the solution an arbitrary number of 
derivatives x^") = d^x{m) /dm^ at almost no extra computational cost, provided 
the matrix is of the special form of equation |^ and the source tq is independent 
of m. 

n-fold differentiation of eq. || with respect to m leads to a linear equation for 
the n-th derivative 

ylx(") - nx("-^) = 0. (4) 

Let Xi denote the approximation of the solution x after i iterations of the 
Minimal Residual algorithm. We are interested in a similar approximation for 



3 



the 1st derivative x^l\ Therefore we demand eq. ^ for n = 1 to hold at each 
iteration step: 



Ax. 



(1) 

i 



Xi 



(5) 



It is obvious that the approximants x\ , defined in this way, converge towards 
x^^^ as Xi approaches x. The next step is to calculate the recursion relation for 



.(1) 



To this end we make use of the recursion relation for Xi itself which is 



particularly simple in the case of the Minimal Residual algorithm: 



^(4+1 -xf) = {Xi+l - Xi) 



We then find the formal recursion relation 



ai(ro - Axi). 



(1) _ (1) _ / _ N 

^2+1 X^ — (XiyX X{j. 

At this stage we replace the true solution x by it's best approximation xn. 



(1) 



(1) 



N 



i-1 



J2 "i^i - 



i=o 



j=0 



(6) 
(7) 

(8) 



A summation over i then leads to the following expression for x^^^: 

„(i) 



TV TV 
i=0 j=i 



(9) 



It is important to start the inversion with an initial guess xq = since the non- 
singularity of A then implies that Xq^^ vanishes as we ll|. After a resummation 
of the r.h.s we find 



■'N+l 



N 



1=0 



j=0 



(10) 



Note that we can recursively build up the r.h.s. of this equation in terms of the 
coefficients 



(1) 

j=0 

at the price of only one extra vector-update per iteration step 



(1) , (1) 



(11) 



(12) 



One can extend this procedure to higher derivatives by use of the following 
recursion relation for the generalized coefficients: 

af+^)=a,^4"). (13) 

i=o 

nontrivial starting guess for xo would require a starting guess for Xq^' computed from 
Ax'^^ = Xo at the price of an extra inversion 
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In this way one ends up with the general updating-step for the derivatives 

We remark that for every additional derivative only one extra vector of storage 
is necessary. 

2.3 Recombination of the Taylor-series 

Usually one is interested in the solution of eq. Q for a given set of different values 
of m. In this section we will show that it is possible to perform a summation 
over all available approximate derivatives of x{m) and hence obtain a Taylor 
extrapolation to a different value of the parameter m. 

We would like to perform a summation of the N-truncated Taylor series to an 
extrapolated solution = x{m + Am): 

x^ = f^^.("). (15) 



This Taylor series can be constructed recursively by use of equation 14 



N 

f+1 = xf + J2 (Am)-aS"V, := xf + f,a,r„ (16) 

n=0 



where we have introduced scale-factors 



1 ^ 

/. = 1 + -E^"^"«S"^- (17) 

n=l 



In order to compute these scale-factors fi we recast eq. 13 in recursive form: 

(18) 



(n) tti (n) , (n-l) 



Inserted into equation |17| this yields: 

h = 
fo = 



i-l 



1 — Amoj 
1 

1 — Amao 



(19) 



We close with the comment that the entire procedure is selfconsistent up to 
terms of 0{m^^^) and 0{Am^^^). 
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2.4 Template for the M^i?-Algorithm 



Next we write down a template for our Multiple Masses Minimal Residual 
(M^i?) algorithm. For shortness we apply one extrapolated solution x^(Am), 
more solutions can be included easily. 



Initialize: x = 
x^ = 
r = $ 
/=1 

Repeat: p = Ar 

rv, — / , (P£l 

Q. U/T 7 

(P-P) 



1—Ama 

X = X + ar 

„E _ 



X = X + far 
r = r — ap 

check convergence, continue if necessary 



Note that the standard overrelaxation parameter to can be included as usual. 



3 Applications in Lattice QCD 



3.1 The Wilson Fermion Matrix 



In 4 dimensional lattice Quantum Chromodynamics the Wilson fermion opera- 
tor has the structure: 

M = m-D (20) 

with m being the bare-mass and 

4 

+ {l + l^.)Ul{x-p:)5,,^y+^ (21) 

the off-diagonal hopping term. The coefficients U^{x) are SU(3) background 
gauge fields. Greens Functions in QCD are the solutions of the linear equation 

Mx = (22) 



with <I> being a (5-source or a 'smeared' source [13 
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3.2 Results 



The numerical tests were performed on the Quadrics Q4 parallel computer at 
Wuppertal University and the QH2 at IfH/DESY Zeuthen. We test the M^R 
algorithm on a full-QCD configuration of size 16^ X 32 at /3 = 5.6 and the 
following mass trajectory: 

K = — = 0.1575, 0.1570, 0.1565, 0.1555, 0.1530, 0.1400, 0.1000. (23) 
m 



We calculated the propagator on the lightest mass, corresponding to k = 0.1575 
and extrapolated with the M^R - technique to the six heavier masses. Our 
stopping criterion was 



< 10 



-5 



(24) 



with ||.|| denoting the 2-norm and being the true residual ||M(m + Am)xi — 
ro||. In figure ^ the true residuals of the 7 inversions against the number of 
M'^R iterations are shown. It should be noted that the true residuals of the 
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Figure 1: True residuals of 7 mass- values against the number of M'^R- iterations. 



extrapolated heavier masses are always smaller than those corresponding to the 
lightest mass, for which the MR is actually inverting. We even observe a faster 
convergence with increasing Am, despite the fact that we used a truncated 
Taylor-expansion to perform the extrapolation to larger masses. All residuals 
reach a plateau slightly below 10~^, which refiects the 32bit precision of the 
Quadrics computer. 
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The gain factor for our example on a 256-node QH2 is a factor of 3 in CPU- 
time compared to seperate inversions at all the masses of the trajectory. This 
gain factor, of course, depends on the number of Green's functions that fit into 
memory and the mass range of the trajectory. 



3.3 Even-Odd Preconditioning 



In lattice quantum chromodynamics it has become standard to use the so 
called even-odd or red-black ordering scheme for the lattice-sites. This ordering 
scheme induces a block structure of the fermion matrix, which can be exploited 
to improve its condition number. 




If one denotes the even lattice sites with Xe, the odd sites with equation ^ 
reads 

Right preconditioning with the matrix 

M'=( ^'"] (26) 

y Doe m J 

yields two seperate matrix equations, each with a smaller condition number 
resulting in a performance gain of a factor 2-3: 



- DpoDoe 
- DopDeo 





(27) 



The vector y can be transformed back to x using one additional matrix multi- 
plication: 

X = M'y. (28) 

The M^R-Template can be applied as shown above to the even-odd precondi- 
tioned system using m? as extrapolation parameter. 

We test the preconditioned version of the algorithm on the same configuration, 
which has been reordered into even-odd form. We prefer to use a (^-source on an 
even lattice-point, because it enables us to restrict the application to the upper 
half of the system saving a factor of two in CPU time. Using the trajectory 
of seven masses from section we obtain the results shown in fig. ^. The 
behaviour remains qualitatively the same. Due to the smaller condition number 
of the matrix the convergence speed increases by approximately a factor of 2. 

The gain factor compared to a sequential MR inversion on every mass is equal 
to 3. For a general source, however, one has to carry out the inversion on both 



systems of equation 27 reducing the above gain factor for our example to 1.5 . 
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Figure 2: True residuals of 7 mass- values against the number of M'^R-iterations 
(with even-odd preconditioning). 



4 Summary and Outlook 



We have presented a new numerical method to compute a trajectory of solutions 
X, of the linear system Ax = r exploiting the shifted structure A = m — D of 
the linear system. We have shown that in a realistic setup in the framework 
of lattice Quantum Chromodynamics the method appears to be numerically 
stable as well as more efficient than standard methods. 



For our setup of masses the gain factor varies between 1.5 and 3 depending on 
the source. Furthermore it depends strongly on the set of masses to compute. 
Note that we have chosen a large range of masses to demonstrate the conver- 
gence properties of the algorithm. In a realistic QCD setup the gain factor 
would be higher. This holds in particular for situations where one would like 
to make use of the Feynman-Hellmann theorem with respect to the quark mass 
dependence as in the case of the Pion-Nucleon-Sigma term [14| in Quenched 
QCD. 



The method can be generalized in many different directions: In principle it can 
be embedded into any known iterative solver like Conjugate Gradient (CG) or 
Generalized Conjugate Residual (GCR). Furthermore it seems possible to de- 
rive similar algorithms for matrices with more complex parameter-dependencies 
than the one discussed in this paper. 
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